A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Abstract:
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To obtain good performance in Conditional Random Field-based Persian Named Entity Recognition, we designed several syntactic features based on dependency grammar, along with some morphological and language-independent features, to supply suitable input for the learning phase. In this implementation, the designed features are applied to a Conditional Random Field to build our model. To evaluate our system, we used the Persian syntactic dependency Treebank, which contains about 30,000 sentences and was prepared at the NOOR Islamic science computer research center. This Treebank has named-entity tags such as Person, Organization and Location. The results of this study show that our approach achieves 86.86% precision, 80.29% recall and 83.44% F-measure, which are higher than the values reported for other Persian NER methods.
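As a rough illustration of the three feature families named in the abstract, the following Python sketch builds a per-token feature dictionary of the kind typically fed to a CRF toolkit. It is not the authors' feature set: the feature names, the `(word, dependency_relation, head_index)` input format, and the toy sentence are all assumptions made for illustration. The `f_measure` helper is the standard balanced F-measure; applied to the reported precision and recall it gives roughly 0.834, in line with the reported 83.44%.

```python
from typing import Dict, List, Tuple

# Each token is (word, dependency_relation, head_index). This input format is
# an assumption for illustration, not the Treebank's actual schema.
Token = Tuple[str, str, int]


def token_features(sentence: List[Token], i: int) -> Dict[str, str]:
    """Build a feature dict for token i (hypothetical feature names)."""
    word, dep_rel, head = sentence[i]
    return {
        # Language-independent features: the word and its neighbours.
        "word": word,
        "prev_word": sentence[i - 1][0] if i > 0 else "<BOS>",
        "next_word": sentence[i + 1][0] if i < len(sentence) - 1 else "<EOS>",
        # Simple morphological features: character affixes.
        "prefix2": word[:2],
        "suffix2": word[-2:],
        # Syntactic features taken from the dependency parse.
        "dep_rel": dep_rel,
        "head_word": sentence[head][0] if 0 <= head < len(sentence) else "<ROOT>",
    }


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up sentence with transliterated tokens; head index -1 marks the root.
sentence = [("Ali", "SBJ", 2), ("Tehran", "OBJ", 2), ("visited", "ROOT", -1)]
feats = token_features(sentence, 0)
score = f_measure(0.8686, 0.8029)  # precision and recall reported in the abstract
print(feats["head_word"], round(score, 3))
```

One feature dictionary per token, collected per sentence, is the input shape most CRF libraries accept; the dependency relation and head word are where parse-based features like those described in the abstract would enter.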
Similar resources
Named Entity Recognition in Bengali: A Conditional Random Field Approach
This paper reports about the development of a Named Entity Recognition (NER) system for Bengali using the statistical Conditional Random Fields (CRFs). The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the various named entity (NE) classes. A portion of the partially NE tagged Bengali news corpus, develope...
Named Entity Recognition in Persian Text using Deep Learning
Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Named entity recognition without domain-knowledge using conditional random fields
This paper addresses the problem of not using any domain-knowledge in named entity recognition (NER) tasks. Experiments on two well-known datasets show that the currently mostly used technique – conditional random fields (CRF) – achieves results which are respectable. It is discussed if it is acceptable to pass on better results to get results in a faster and modular way. 1. Conditional Random ...
Disease Named Entity Recognition Using Conditional Random Fields
Named Entity Recognition is a crucial component in bio-medical text mining. In this paper a method for disease Named Entity Recognition is proposed which utilizes sentence and token level features based on Conditional Random Fields using NCBI disease corpus. The feature set used for the experiment includes orthographic, contextual, affixes, n-grams, part of speech tags and word normalization. Using t...
Arabic Named Entity Recognition using Conditional Random Fields
The Named Entity Recognition (NER) task consists in determining and classifying proper names within an open-domain text. This Natural Language Processing task proved to be harder for languages with a complex morphology such as the Arabic language. NER was also proved to help Natural Language Processing tasks such as Machine Translation, Information Retrieval and Question Answering to obtain a h...
PersoNER: Persian Named-Entity Recognition
Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To abridge this gap, in this paper we target the Persian language that is spoken by a population of over a hundred million people world-wide. We first present ...
Journal title
Volume 8, Issue 2
Pages 227-236
Publication date 2020-04-01